Clustering Using Monte Carlo Cross-Validation
Abstract
Finding the “right” number of clusters, k, for a data set is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood ratios, penalized likelihoods, and Bayesian techniques are among the more popular approaches. In this paper a new cross-validated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on Monte Carlo cross-validation (MCCV) is introduced. The algorithm permits the data analyst to judge whether there is strong evidence for a particular k, or perhaps weaker evidence over a sub-range of k values. Experimental results with Gaussian mixtures on real and simulated data suggest that MCCV provides genuine insight into cluster structure. v-fold cross-validation appears inferior to the penalized likelihood method (BIC), a Bayesian algorithm (AutoClass v2.0), and the new MCCV algorithm. Overall, MCCV and AutoClass appear the most reliable of the methods. MCCV provides the data miner with a useful data-driven clustering tool which complements the fully Bayesian approach.
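The general idea of a cross-validated likelihood criterion can be sketched in a few lines. The following is only an illustration under stated assumptions, not the paper's algorithm: it uses one-dimensional data, a hand-written EM fit with quantile-based initialization and a variance floor, and hypothetical names and defaults (`fit_gmm_1d`, `mccv_select_k`, `n_splits`, `test_fraction`) that do not come from the abstract. For each random train/test split, a Gaussian mixture is fitted to the training part for every candidate number of clusters, and the held-out log-likelihood is averaged over splits.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=30):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: initialize the means at evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    center = sum(data) / n
    var_all = sum((x - center) ** 2 for x in data) / n
    floor = 1e-3 * var_all + 1e-12  # variance floor to avoid collapsed components
    variances = [max(var_all, floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, floor)
    return weights, means, variances


def log_likelihood(data, params):
    """Total log-likelihood of data under a fitted mixture."""
    weights, means, variances = params
    total = 0.0
    for x in data:
        p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total


def mccv_select_k(data, k_values, n_splits=10, test_fraction=0.5, seed=0):
    """Average held-out log-likelihood per k over random train/test splits."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    scores = {k: 0.0 for k in k_values}
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for k in k_values:
            scores[k] += log_likelihood(test, fit_gmm_1d(train, k)) / n_splits
    best = max(scores, key=scores.get)
    return best, scores


# Simulated example: two well-separated 1-D clusters.
gen = random.Random(42)
data = [gen.gauss(0.0, 1.0) for _ in range(100)] + [gen.gauss(10.0, 1.0) for _ in range(100)]
best, scores = mccv_select_k(data, k_values=[1, 2, 3, 4])
print(best, {k: round(s, 1) for k, s in scores.items()})
```

Inspecting the whole `scores` dictionary, rather than only `best`, is in the spirit of the abstract: near-equal scores across several values of k suggest weak evidence over a sub-range, while a single clear maximum suggests strong evidence for one k.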
Similar resources
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach
MOTIVATION Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select one that performs the best for his/her data set or application. While various validation measures have been proposed over ...
Siemens Primus accelerator simulation using EGSnrc Monte Carlo code and gel dosimetry validation with optical computed tomography system by EGSnrc code
The Monte Carlo method is the most accurate method for simulating radiation therapy equipment. Linear accelerators (linacs) are currently the most widely used machines in radiation therapy centers. Monte Carlo modeling of the Siemens Primus linear accelerator in 6 MeV beams was used. A square field of 10 × 10 cm² produced by the jaws was compared with TLD. Head simulation of Siemens accele...
An Empirical Study on the Visual Cluster Validation Method with Fastmap
This paper presents an empirical study on the visual method for cluster validation based on the Fastmap projection. The visual cluster validation method attempts to tackle two clustering problems in data mining: (1) to verify partitions of data created by a clustering algorithm and (2) to identify genuine clusters from data partitions. They are achieved through projecting objects and clus...
Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models
In the mixed modeling framework, Monte Carlo simulation and cross validation are employed to develop an “improved” Akaike information criterion, AICi, and the predictive divergence criterion, PDC, respectively, for model selection. The selection and estimation performance of the criteria are investigated in a simulation study. Our simulation results demonstrate that PDC outperforms AIC and A...
A Mixture model with random-effects components for clustering correlated gene-expression profiles
MOTIVATION The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may...